Parametric statistical inference may be said to be concerned with
statistical inference of idealized parameters from ideal data. Huber (1977),
p. 1, writes: "The traditional approach to theoretical statistics was and
is to optimize at an idealized parametric model."

Robust statistical inference may be said to be concerned with
statistical inference of idealized parameters from semi-ideal data (by the
use of methods which are insensitive to small deviations from the
ideal assumptions). Huber (1977), p. 3, writes that the robust approach to
theoretical statistics assumes "an idealized parametric model, but in
addition one would like to make sure that methods work well not only at the
model itself, but also in a neighborhood of it."

Exploratory data analysis may be said to be concerned with
statistical inference from non-ideal data (often by seeking re-expressions,
that is, transformations, of the data that will make it more ideal).
Exploratory data analysis helps pose the well-posed statistical questions to
which classical parametric statistics provides answers.


*Research supported by the Army Research Office (Grant DA AG29-76-0239).

This paper provides an overview of a new general approach to
statistical data analysis and parameter estimation which could be called
the quantile function approach. The aims of descriptive statistics (to
graphically summarize and display the data) are attained by Quantile-Box
plots of the sample quantile function. The aims of "goodness of fit" are
attained by fitting smooth quantile functions to the sample quantile function.

The aims of parameter estimation, especially robust estimation of location
and scale parameters, are attained by regression analysis of the sample
quantile function. (The goal of a statistician in analyzing a batch of data
$X_1, \ldots, X_n$ should be both "estimation of parameters" and "goodness
of fit". By "goodness of fit" is meant fitting of the observed sample
probabilities by a smooth probability law.)

Quantile functions are defined in section 2. Window estimators of
location and scale parameters are defined in section 3; their
equivalence to L-estimators is discussed in section 4. A conjectured
expression is given in section 5 for the asymptotic variance of window
estimators. New approaches being developed for non-parametric probability
law modeling are mentioned in section 6; quantile box-plots are introduced
in section 7. Section 8 discusses location and scale parameter estimation
using trimmed samples. Robust regression is the subject of section 9. A new
definition of statistics is proposed in section 10.

To carry out robust estimation of location parameters in practice,
this paper proposes computing means which adapt to the ends (by "ends"
one means the tail character of the distribution of the data). Three such
methods are given in the paper:

(1) Iteratively reweighted estimators with weight function
$w(x) = (1 + \tfrac{1}{m} x^2)^{-1}$ for suitable choices of $m$ (section 3);

(2)  Maximum  likelihood  estimation  omitting  extreme  order  statistics 
where  the  percentage  of  values  omitted  is  determined  from  the  goodness  of 
fit  of  the  corresponding  smooth  quantile  functions  (section  8); 

(3)  Adaptive  L-estimation  of  location  and  scale  parameters  using 
autoregressive  estimators  of  density-quantile  functions  (section  8). 

A fourth method of robust location and scale parameter estimation is:

(4) Quantile box-plot diagnostics which indicate that mid-summaries
and mid-scales are equal enough to provide naive estimators of location
and scale (section 7).
2. Quantile Function

The quantile function $Q(u)$, $0 \le u \le 1$, of a random variable $X$ is the
inverse of its distribution function $F(x) = P(X \le x)$. The precise
definition of $Q$ is:

$$Q(u) = F^{-1}(u) = \inf\{x : F(x) \ge u\}.$$

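Given a sample $X_1, \ldots, X_n$, we denote the sample distribution
function by $\tilde F(x)$, $-\infty < x < \infty$; it is defined by

$$\tilde F(x) = \text{fraction of } X_1, \ldots, X_n \le x.$$

The Sample Quantile Function

$$\tilde Q(u) = \tilde F^{-1}(u) = \inf\{x : \tilde F(x) \ge u\}$$

can be computed explicitly in terms of the order statistics
$X_{(1)} < X_{(2)} < \cdots < X_{(n)}$ (which are the values in the sample
arranged in increasing order):

$$\tilde Q(u) = X_{(j)} \quad \text{for} \quad \frac{j-1}{n} < u \le \frac{j}{n}, \quad j = 1, 2, \ldots, n.$$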

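The foregoing definition of $\tilde Q(u)$ is a piecewise constant function.
It is more convenient to define $\tilde Q(u)$ as a piecewise linear function.
Divide the unit interval into $2n$ subintervals. For $u = (2j-1)/2n$ define

$$\tilde Q\left(\frac{2j-1}{2n}\right) = X_{(j)}, \quad j = 1, 2, \ldots, n.$$

For $u$ in

$$\frac{2j-1}{2n} \le u \le \frac{2j+1}{2n}, \quad j = 1, 2, \ldots, n-1,$$

define $\tilde Q(u)$ by linear interpolation; thus for $u$ in this interval

$$\tilde Q(u) = n\left(u - \frac{2j-1}{2n}\right) X_{(j+1)} + n\left(\frac{2j+1}{2n} - u\right) X_{(j)}.$$

In particular

$$\tilde Q\left(\frac{j}{n}\right) = \frac{1}{2}\left\{X_{(j)} + X_{(j+1)}\right\}.$$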


The population median is $Q(0.5)$. The sample median is $\tilde Q(0.5)$.
Our definition of $\tilde Q(u)$ has the merit that $\tilde Q(0.5)$ is the
usual definition of the sample median:

$$\tilde Q(0.5) = \begin{cases} X_{(m+1)} & \text{if } n = 2m + 1 \text{ is odd,} \\ \tfrac{1}{2}\left(X_{(m)} + X_{(m+1)}\right) & \text{if } n = 2m \text{ is even.} \end{cases}$$
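The piecewise linear definition can be sketched in code. The following is a minimal Python illustration and not part of the original paper: the function name `sample_quantile` is hypothetical, and clamping to the extreme order statistics outside the interval from $1/2n$ to $1 - 1/2n$ is an assumption, since the text defines the interpolation only between the first and last knots.

```python
def sample_quantile(data, u):
    """Piecewise linear sample quantile function.

    Knots are placed at u = (2j - 1) / (2n), where the function equals the
    j-th order statistic; values in between are linearly interpolated.
    Outside the first and last knots the value is clamped to the nearest
    extreme order statistic (an assumption, not stated in the text).
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must be non-empty")
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    t = n * u + 0.5            # fractional 1-based rank; knot j sits at t = j
    if t <= 1:
        return xs[0]
    if t >= n:
        return xs[-1]
    j = int(t)                 # 1 <= j <= n - 1
    frac = t - j               # equals n * (u - (2j - 1) / (2n))
    return (1 - frac) * xs[j - 1] + frac * xs[j]
```

For an odd sample size the value at `u = 0.5` is the middle order statistic, and for an even sample size it is the average of the two middle order statistics, in agreement with the usual sample median.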
The asymptotic distribution of $\tilde Q(u)$ satisfies:

$$\sqrt{n}\, fQ(u)\, \{\tilde Q(u) - Q(u)\}$$

is asymptotically normal, with mean 0 and variance $u(1 - u)$,
where $fQ(u)$ denotes the probability density function $f(x) = F'(x)$
evaluated at $x = Q(u)$; in symbols,

$$fQ(u) = f(Q(u)).$$

We call $fQ(u)$ the density-quantile function.

Estimating the $fQ$-function is of interest for two reasons: as a way
of estimating (1) the true probability density function $f(x)$, and
(2) approximate confidence intervals for $Q(u)$ and especially for the true
median $Q(0.5)$, since

$$\tilde Q(0.5) \pm \{\sqrt{n}\, fQ(0.5)\}^{-1}$$

is an approximate 95% confidence interval for the median $Q(0.5)$.

We call $q(u) = Q'(u)$ the quantile-density function. The identity

$$FQ(u) = u$$

implies the reciprocal relationship

$$fQ(u)\, q(u) = 1.$$

Thus we may write (using $\doteq$ to denote approximate equality)

$$\frac{1}{fQ(0.5)} = q(0.5) \doteq 2\{Q(0.75) - Q(0.25)\}.$$


We define, for $0 < p \le 0.5$,

$$R(p) = Q(1-p) - Q(p)$$

to be the p-range, and

$$\tilde R(p) = \tilde Q(1-p) - \tilde Q(p)$$

to be the sample p-range. When $p = 0.25$, we call $Q(0.75)$ and $Q(0.25)$ the
quartiles,

$$R(0.25) = Q(0.75) - Q(0.25)$$

the quartile-range, and

$$\tilde R(0.25) = \tilde Q(0.75) - \tilde Q(0.25)$$

the sample quartile-range.
One can conclude that the median $Q(0.5)$ has a non-parametric estimator
given by $\tilde Q(0.5)$, and an approximate 95% confidence interval given by

$$\tilde Q(0.5) \pm 2\tilde R(0.25)/\sqrt{n}.$$

A use of a confidence interval of this kind for the median is discussed by
McGill, Tukey, and Larsen (1978).
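As an illustration of this interval, the sketch below computes the sample median and the half-width $2\tilde R(0.25)/\sqrt{n}$ in Python. It is an illustrative sketch only: the function names are hypothetical, and the quantile helper clamps to the extreme order statistics at the ends, which is an assumption not made in the text.

```python
import math


def sample_quantile(data, u):
    """Piecewise linear sample quantile with knots at (2j - 1) / (2n)."""
    xs = sorted(data)
    n = len(xs)
    t = n * u + 0.5
    if t <= 1:
        return xs[0]           # clamped below the first knot (assumption)
    if t >= n:
        return xs[-1]          # clamped above the last knot (assumption)
    j = int(t)
    frac = t - j
    return (1 - frac) * xs[j - 1] + frac * xs[j]


def median_interval(data):
    """Return (lower, median, upper) using the quartile-range half-width."""
    n = len(data)
    median = sample_quantile(data, 0.5)
    quartile_range = sample_quantile(data, 0.75) - sample_quantile(data, 0.25)
    half_width = 2.0 * quartile_range / math.sqrt(n)
    return median - half_width, median, median + half_width
```

For the nine values 1 through 9, the sample quartiles are 2.75 and 7.25, so the half-width is 2(4.5)/3 = 3 and the interval runs from 2 to 8 around the median 5.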

The  aim  of  the  foregoing  discussion  is  to  introduce  the  quantile 
function  and  illustrate  how  it  is  traditionally  used  to  provide  non-parametric 
measures  of  location  (such  as  the  median)  and  scale  (such  as  the  quartile 
range).  Our  aim  is  to  use  quantile  functions  to  detect  and  describe  ideal 
and  non-ideal  statistical  models  for  data. 

3.  Location  and  Scale  Estimation  by  Window  Estimators 

One of the points which this paper would like to make is that
measures of location and scale of a data sample are interpretable only if
they are probability based, in the sense that they are estimators of
characteristics of the true quantile function of the random variable $X$.

We use $\mu$ and $\sigma$ to denote measures of location and scale
respectively. When $\mu$ and $\sigma$ represent median and inter-quartile
range, $\mu = Q(0.5)$ and $\sigma = Q(0.75) - Q(0.25)$. When $\mu$ and
$\sigma^2$ represent mean and variance, they can be expressed in terms of
$Q$ by

$$\mu = \int_0^1 Q(u)\, du, \qquad \sigma^2 = \int_0^1 \{Q(u) - \mu\}^2\, du.$$

These formulas follow immediately from the basic fact that $X$ is
identically distributed as $Q(U)$ where $U$ is uniformly distributed on
the interval $[0, 1]$.

When $\mu$ and $\sigma^2$ represent mean and variance, fully non-parametric
estimators of $\mu$ and $\sigma^2$ are

$$\tilde\mu = \int_0^1 \tilde Q(u)\, du, \qquad \tilde\sigma^2 = \int_0^1 \{\tilde Q(u) - \tilde\mu\}^2\, du,$$

which are essentially the sample mean and the sample variance.

To efficiently estimate location and scale parameters $\mu$ and $\sigma$,
it is customary to start with a model for the probability density function
$f(x)$ of the form

$$f(x) = \frac{1}{\sigma}\, f_0\left(\frac{x - \mu}{\sigma}\right) \qquad (*)$$

where $f_0(x)$ is a known probability density function. Define $L(\mu, \sigma)$
to be $(1/n)$ times the log-likelihood of the sample $X_1, \ldots, X_n$; it is
given by

$$L(\mu, \sigma) = -\log\sigma + \frac{1}{n} \sum_{j=1}^{n} \log f_0\left(\frac{X_j - \mu}{\sigma}\right).$$

One can express likelihood in terms of quantile functions:

$$L(\mu, \sigma) = -\log\sigma + \int_0^1 \log f_0\left(\frac{\tilde Q(u) - \mu}{\sigma}\right) du.$$

The model (*) leads to a very simple formula for the true
quantile function $Q(u)$ of the data:

$$Q(u) = \mu + \sigma\, Q_0(u)$$

where $Q_0(u)$ is a known quantile function corresponding to $f_0(x)$. For
ease of writing we introduce the notation

$$\tilde Q_0(u) = \frac{\tilde Q(u) - \mu}{\sigma}.$$


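The maximum likelihood estimators $\hat\mu$ and $\hat\sigma$ satisfy the
log-likelihood derivative equations:

$$\frac{\partial}{\partial\mu} L(\hat\mu, \hat\sigma) = 0, \qquad \frac{\partial}{\partial\sigma} L(\hat\mu, \hat\sigma) = 0.$$

To compactly write formulas for these derivatives, define

$$\psi(x) = \frac{-f_0'(x)}{f_0(x)} = -\frac{d}{dx} \log f_0(x), \qquad w(x) = \frac{\psi(x)}{x}.$$

Then

$$\frac{\partial}{\partial\mu} L(\mu, \sigma) = \frac{1}{\sigma} \int_0^1 \psi(\tilde Q_0(u))\, du = \frac{1}{\sigma^2} \int_0^1 \{\tilde Q(u) - \mu\}\, w(\tilde Q_0(u))\, du,$$

$$\frac{\partial}{\partial\sigma} L(\mu, \sigma) = -\frac{1}{\sigma} + \frac{1}{\sigma^2} \int_0^1 \psi(\tilde Q_0(u))\, \{\tilde Q(u) - \mu\}\, du = -\frac{1}{\sigma} + \frac{1}{\sigma^3} \int_0^1 w(\tilde Q_0(u))\, \{\tilde Q(u) - \mu\}^2\, du.$$

In the normal case, $\psi(x) = x$, $w(x) = 1$, and $\hat\mu$ and $\hat\sigma^2$
are equal to the sample mean and variance respectively.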

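To obtain estimators $\hat\mu$ and $\hat\sigma$ without specifying $f_0(x)$,
one introduces the concept of iteratively reweighted estimators of $\mu$
and $\sigma^2$. Given estimators $\mu^*$ and $\sigma^*$, define

$$\tilde Q_0^*(u) = \frac{\tilde Q(u) - \mu^*}{\sigma^*}.$$

Then as "approximate" solutions of the log-likelihood derivative equations,
one studies the estimators defined by

$$\hat\mu = \frac{\int_0^1 \tilde Q(u)\, w(\tilde Q_0^*(u))\, du}{\int_0^1 w(\tilde Q_0^*(u))\, du},$$

$$\hat\sigma^2 = \int_0^1 \{\tilde Q(u) - \hat\mu\}^2\, w(\tilde Q_0^*(u))\, du,$$

with weight function $w(x) = (1 + \tfrac{1}{m} x^2)^{-1}$.
We call this weight function a window, and we call $\hat\mu$ and $\hat\sigma$
window estimators.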

To completely specify the window, one must specify a value for $m$
(which we could call the "trimming width" of the window). The more normal
the data is believed to be, the larger $m$ should be chosen (say, $m = 25$).
The more Cauchy-distributed the data is believed to be, the closer to 1
$m$ should be chosen (say, $m = 4$). In practice, one might try both
values of $m$, and compare the results. The constant $m$ could also be
estimated adaptively to yield "self-tuning" robust estimators of location
and scale.
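The iteration described in this section can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the paper's own algorithm: the integrals over $u$ are replaced by averages over the observations, the preliminary estimators are taken to be the sample median and the quartile range divided by 1.349 (the quartile range of a standard normal), and the names `window_estimates` and `iterations` are hypothetical.

```python
import math


def sample_quantile(data, u):
    """Piecewise linear sample quantile with knots at (2j - 1) / (2n)."""
    xs = sorted(data)
    n = len(xs)
    t = n * u + 0.5
    if t <= 1:
        return xs[0]
    if t >= n:
        return xs[-1]
    j = int(t)
    frac = t - j
    return (1 - frac) * xs[j - 1] + frac * xs[j]


def window_estimates(data, m=4.0, iterations=50):
    """Iteratively reweighted location and scale with w(x) = 1 / (1 + x^2 / m)."""
    xs = list(data)
    n = len(xs)
    # Preliminary estimators (assumption): median and normalized quartile range.
    mu = sample_quantile(xs, 0.5)
    sigma = (sample_quantile(xs, 0.75) - sample_quantile(xs, 0.25)) / 1.349
    if sigma <= 0:
        raise ValueError("quartile range must be positive")
    for _ in range(iterations):
        # Weights depend only on the size of the standardized residuals.
        w = [1.0 / (1.0 + ((x - mu) / sigma) ** 2 / m) for x in xs]
        mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        # Unnormalized weighted second moment, mirroring the sigma-hat formula.
        sigma = math.sqrt(sum(wi * (x - mu) ** 2 for wi, x in zip(w, xs)) / n)
    return mu, sigma
```

Calling `window_estimates(data, m=4.0)` and `window_estimates(data, m=25.0)` on the same batch gives the comparison between a long-tailed and a near-normal window suggested in the text. The scale returned here is the weighted second moment, which is not rescaled to be an unbiased estimator of $\sigma$.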
A window recommended by Tukey (see Mosteller and Tukey (1977),
p. 205) is the bisquare window:

$$w_{\text{Bisquare}}(x) = \left\{\left(1 - \left(\frac{x}{c}\right)^2\right)_+\right\}^2$$

where $c$ is a suitably chosen constant. Tukey recommends that $c$ be
taken to be 6 or 4 when $x$ is measured in units of $\sigma$. It seems likely
that the choice of $c$ should reflect one's beliefs about the long-tailed
character of the data.


4. Weight Functions of L-Estimators

An L-estimator $\hat\mu$ of a location parameter is a linear combination
of order statistics $X_{(1)} < \cdots < X_{(n)}$, which we write in the form

$$\hat\mu = \int_0^1 \tilde Q(u)\, W(u)\, du$$

for suitable weight function $W(u)$. Asymptotically efficient L-estimators of
$\mu$ and $\sigma$ in the model $Q(u) = \mu + \sigma Q_0(u)$, when $f_0$ is a
symmetric density, are given by [see Parzen (1978), and summary in section 8]

$$\hat\mu = \int_0^1 \tilde Q(u)\, W_\mu(u)\, du \div \int_0^1 W_\mu(u)\, du,$$

$$\hat\sigma = \int_0^1 \tilde Q(u)\, W_\sigma(u)\, du \div \int_0^1 Q_0(u)\, W_\sigma(u)\, du,$$

where

$$W_\mu(u) = f_0Q_0(u)\, J_0'(u) = \frac{J_0'(u)}{q_0(u)}$$

and

$$W_\sigma(u) = J_0(u) + Q_0(u)\, W_\mu(u).$$

Here $f_0Q_0(u)$ is the density-quantile function corresponding to $f_0$,
$q_0(u) = 1/f_0Q_0(u)$ is the corresponding quantile-density function, and
$J_0(u)$ is its score function defined by

$$J_0(u) = -(f_0Q_0)'(u) = \frac{-f_0'(Q_0(u))}{f_0(Q_0(u))} = \psi(Q_0(u)).$$


An L-estimator forms a weighted average of order statistics in
which the weights depend on the ranks $u$. It is of interest to express the
weights as a function of $Q_0(u)$, which is the size of the order statistics.
One can derive such formulas starting from the general representation,
given by Parzen (1978), as $u \to 1$,

$$f_0Q_0(u) \sim (1 - u)^{\alpha}, \qquad Q_0(u) \sim (1 - u)^{-(\alpha - 1)},$$

where $\alpha$, called the tail exponent, is assumed to satisfy $\alpha > 1$
(indicative of long tailed distributions). We write

$$W_\mu = J_0'\, f_0Q_0 = -(f_0Q_0)^2 \left[(\log f_0Q_0)'' + \{(\log f_0Q_0)'\}^2\right],$$

$$W_\sigma = J_0 + Q_0 W_\mu = -(f_0Q_0)(\log f_0Q_0)' + Q_0 W_\mu.$$

Therefore

$$W_\mu(u) \sim (1 - u)^{2(\alpha - 1)}\, \alpha(1 - \alpha) \sim \frac{1}{Q_0^2(u)}\, \alpha(1 - \alpha),$$

$$W_\sigma(u) \sim (1 - u)^{\alpha - 1}\, \alpha(2 - \alpha) \sim \frac{1}{Q_0(u)}\, \alpha(2 - \alpha).$$

The main conclusion we desire to point out is that if one expresses
$W_\mu(u)$ as a function $w$ of $Q_0(u)$,

$$W_\mu(u) = w(Q_0(u)),$$

then for long-tailed distributions, $w(x) \sim x^{-2}$. By writing $W$ as a
function of $Q_0$, one can form, for an L-estimator, an equivalent
iteratively reweighted estimator.


Given preliminary estimators $\mu^*$ and $\sigma^*$, form

$$\tilde Q_0^*(u) = \frac{\tilde Q(u) - \mu^*}{\sigma^*}$$

and define

$$\hat\mu = \int_0^1 w(\tilde Q_0^*(u))\, \tilde Q(u)\, du \div \int_0^1 w(\tilde Q_0^*(u))\, du.$$

This estimator is a weighted average of $\tilde Q$ with weights a function only
of the size of the standardized residuals $\tilde Q_0^*(u)$.

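For Student's t-distribution with $m$ degrees of freedom,

$$J_0(u) = \psi(Q_0(u)) = \frac{m + 1}{m} \cdot \frac{Q_0(u)}{1 + \tfrac{1}{m} Q_0^2(u)}.$$

Consequently

$$W_\mu(u) = w_\mu(Q_0(u)), \qquad W_\sigma(u) = w_\sigma(Q_0(u))$$

with

$$w_\mu(x) = \frac{m + 1}{m} \cdot \frac{1 - (x^2/m)}{[1 + (x^2/m)]^2}, \qquad w_\sigma(x) = \frac{m + 1}{m} \cdot \frac{2x}{[1 + (x^2/m)]^2}.$$

These windows deserve further investigation. However they appear to
support the recommendation that robust estimators of location and scale
may be obtained from preliminary estimators $\mu^*$ and $\sigma^*$ by the
formulas (for a suitably chosen value of $m$)

$$\hat\mu = \int_0^1 \tilde Q(u) \left(1 + \tfrac{1}{m} \{\tilde Q_0^*(u)\}^2\right)^{-1} du \div \int_0^1 \left(1 + \tfrac{1}{m} \{\tilde Q_0^*(u)\}^2\right)^{-1} du,$$

$$\hat\sigma^2 = \int_0^1 \{\tilde Q(u) - \hat\mu\}^2 \left(1 + \tfrac{1}{m} \{\tilde Q_0^*(u)\}^2\right)^{-1} du.$$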
5. Variance and Influence Functions of Window Estimators

This section presents a conjectured formula for the asymptotic
variance of a window estimator which is derived by representing it as an
L-estimator

$$\hat\mu = \int_0^1 w(Q_0(u))\, \tilde Q(u)\, du \div \int_0^1 w(Q_0(u))\, du$$

where $w(x) = (1 + \tfrac{1}{m} x^2)^{-1}$. The question of deriving the theory
of $\hat\mu$ as an M-estimator is open for research; $\hat\mu$ is an
M-estimator if it satisfies

$$\int_0^1 \psi\left(\frac{\tilde Q(u) - \hat\mu}{\hat\sigma}\right) du = 0$$

for a suitable $\psi$ function, here chosen to be

$$\psi(x) = \frac{x}{1 + (x^2/m)}.$$

Under the assumption that the true quantile function is of the form
$Q(u) = \mu + \sigma Q_0(u)$, and that $Q_0(1 - u) = -Q_0(u)$, signifying a
symmetric distribution, we seek to find the variance $V$ of the asymptotic
distribution of $\sqrt{n}(\hat\mu - \mu)$, which is normal with zero mean and
asymptotic variance $V$. From the asymptotic distribution theory of
L-estimators

$$V = \int_0^1 |V(u)|^2\, du \div \left\{\int_0^1 w(Q_0(u))\, du\right\}^2$$

where

$$V'(u) = w(Q_0(u))\, q(u) = \sigma\, w(Q_0(u))\, q_0(u), \qquad V(u) = \sigma\, v(Q_0(u)),$$

defining

$$v(x) = \int_0^x w(y)\, dy = \sqrt{m}\, \tan^{-1}(x/\sqrt{m}).$$

Further $v(x)$ is the influence function of the estimator (Huber (1977), p. 17).
Note that for fixed $x$, $v(x) \to x$ as $m \to \infty$. The formula for
the variance of the robust estimator $\hat\mu$ may be written explicitly

$$\operatorname{Var}(\hat\mu) = \frac{\sigma^2}{n} \cdot \frac{m \int_0^1 \{\tan^{-1}(Q_0(u)/\sqrt{m})\}^2\, du}{\left\{\int_0^1 \left(1 + \tfrac{1}{m} Q_0^2(u)\right)^{-1} du\right\}^2}$$

and can clearly be regarded as a generalization of the traditional formula for
the variance of the sample mean. It is derived under the assumption of a
symmetric but possibly long-tailed distribution.
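To see how this conjectured formula behaves, one can evaluate the factor multiplying $\sigma^2/n$ numerically. The sketch below does so for a standard normal $Q_0$ with a midpoint rule; it is an illustration under assumptions (the choice of the normal $Q_0$, the grid size, and the name `window_variance_factor` are not from the paper).

```python
import math
from statistics import NormalDist


def window_variance_factor(m, grid=20000):
    """Approximate m * int atan(Q0/sqrt(m))^2 du / (int (1 + Q0^2/m)^-1 du)^2.

    Q0 is taken to be the standard normal quantile function (an assumption
    made for illustration). Integrals over (0, 1) use a midpoint rule.
    """
    std = NormalDist()
    root_m = math.sqrt(m)
    numerator = 0.0
    denominator = 0.0
    for i in range(grid):
        u = (i + 0.5) / grid
        q0 = std.inv_cdf(u)
        numerator += m * math.atan(q0 / root_m) ** 2
        denominator += 1.0 / (1.0 + q0 * q0 / m)
    numerator /= grid
    denominator /= grid
    return numerator / denominator ** 2
```

For very large `m` the factor approaches the variance of $Q_0$, which is 1 for the standard normal, recovering the usual variance of the sample mean; for `m = 25` it stays close to 1, while a narrow window such as `m = 1` gives a visibly larger factor at the normal model.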

To estimate $\operatorname{Var}(\hat\mu)$ in practice, one might replace
$\sigma$ by $\hat\sigma$ and $Q_0(u)$ by $(\tilde Q(u) - \hat\mu)/\hat\sigma$
if a Quantile-Box plot of $\tilde Q(u) - \hat\mu$ indicates that
it is symmetrically distributed about 0.

It should be noted that under the model $Q(u) = \mu + \sigma Q_0(u)$, with
$Q_0(1 - u) = -Q_0(u)$, $\hat\mu$ estimates

$$\int_0^1 w(Q_0(u))\, Q(u)\, du \div \int_0^1 w(Q_0(u))\, du = \mu$$

while $\hat\sigma^2$ estimates

$$\sigma^2 \int_0^1 w(Q_0(u))\, Q_0^2(u)\, du = \sigma^2 \int_0^1 Q_0^2(u) \left(1 + \tfrac{1}{m} Q_0^2(u)\right)^{-1} du.$$
6. Non-parametric Probability Law Modeling

To interpret (as well as to form) location and scale parameter
estimators from a data batch $X_1, \ldots, X_n$ one must model its probability
law. This section briefly mentions some new approaches which are
currently being developed for non-parametric probability law modeling
(see Parzen (1978)). They all involve both graphical and numerical analysis
of the sample quantile function $\tilde Q$ to find smoothing functions
$\hat Q$. Quantile Box-Plots are introduced in the next section.

Quantile Residual Brownian Bridge Test. To say that the true
quantile function $Q(u)$ obeys the hypothesis $H_0: Q(u) = \mu + \sigma Q_0(u)$
is to say that one can find values $\hat\mu$ and $\hat\sigma$ such that
$\hat Q(u) = \hat\mu + \hat\sigma Q_0(u)$ fits $\tilde Q$. The fit of $\hat Q$
to $\tilde Q$ can be judged by displaying the quantile residuals

$$\tilde R(u) = f_0Q_0(u)\, \{\tilde Q(u) - \hat Q(u)\}, \quad 0 \le u \le 1,$$

where $f_0Q_0(u) = f_0(Q_0(u))$ is the density-quantile function corresponding
to $F_0$. Under the null hypothesis $(\sqrt{n}/\sigma)\, \tilde R(u)$,
$0 \le u \le 1$, is asymptotically distributed as a stochastic process $B(u)$,
$0 \le u \le 1$, which is a modified Brownian Bridge process in the sense that
its covariance kernel $E(B(u_1) B(u_2))$ is not $\min(u_1, u_2) - u_1 u_2$ but is
modified due to the estimation of the parameters $\mu$ and $\sigma$. To test
whether the sample path $\tilde R(u)$ looks like a sample path from a modified
Brownian Bridge process one could use various functionals
whose asymptotic distribution is known from their role in the conventional
theory of Goodness of Fit Tests. The sample process traditionally
considered for goodness of fit tests is

$$\tilde D_0(u) = F_0\left((\tilde Q(u) - \hat\mu)/\hat\sigma\right).$$

To estimate $\sigma$ (needed in the asymptotic distribution of $\tilde R(u)$)
one could use a non-parametric estimator such as

$$\tilde\sigma_0 = \int_0^1 J_0(u)\, \tilde Q(u)\, du.$$

To estimate $\hat\mu$ and $\hat\sigma$ needed to form $\hat Q(u)$, one could
use quick and dirty estimators $\tilde\mu^*$ and $\tilde\sigma^*$ formed from
Quantile Box-Plots, or one could use asymptotically efficient estimators
formed from regression analysis of the continuous process $\tilde Q(u)$
(see section 8).

Cumulative Weighted Spacings Brownian Bridge Tests. To test
whether the true quantile function $Q(u)$ is of the form
$Q(u) = \mu + \sigma Q_0(u)$, one need not first estimate $\mu$ and $\sigma$.
Instead, following Parzen (1978), form

$$\tilde D(u) = \frac{1}{\tilde\sigma_0} \int_0^u f_0Q_0(t)\, d\tilde Q(t), \quad 0 \le u \le 1,$$

which is an estimator of

$$D(u) = \frac{1}{\sigma_0} \int_0^u f_0Q_0(t)\, dQ(t), \quad 0 \le u \le 1,$$

defining

$$\sigma_0 = \int_0^1 f_0Q_0(u)\, dQ(u).$$

Under the null hypothesis, $D(u) = u$, and it is conjectured that
$\sqrt{n}\, \{\tilde D(u) - u\}$, $0 \le u \le 1$, is asymptotically distributed
as a Brownian Bridge stochastic process.

By suitably choosing the null hypothesis $H_0$ and the standard
density-quantile function $f_0Q_0$, one can test the goodness of fit of any
specified probability law (normal, exponential, Weibull, Cauchy, etc.) to
the data.

Density-Quantile Function Autoregressive Estimation. Parzen (1978)
discusses autoregressive estimators $\hat d(u)$ of

$$d(u) = D'(u) = \frac{1}{\sigma_0} \cdot \frac{f_0Q_0(u)}{fQ(u)}$$

which can be used to form estimators of $fQ(u)$.

The density-quantile function $fQ$ can be estimated also by forming
autoregressive smoothers $\hat D_0(u)$ of

$$\tilde D_0(u) = F_0\left((\tilde Q(u) - \hat\mu)/\hat\sigma\right),$$

with density

$$d_0(u) = f_0\left((Q(u) - \mu)/\sigma\right) q(u)/\sigma.$$

The autoregressive density $\hat d_0(u) = \hat D_0'(u)$ is an estimator of

$$d_0(u) = \frac{f_0\left((Q(u) - \mu)/\sigma\right)}{fQ(u)\, \sigma}.$$
7. Quantile Box-Plots Diagnostic Measures

Given a data batch $X_1, \ldots, X_n$, a successful approach to "display"
of the data has been the box plot introduced by Tukey (1977). Five values
from a set of data are conventionally used: the extremes, the upper and lower
H-values (H is an abbreviation for hinges or quartiles), and the M-value
(median). The basic configuration of the box-plot display is a vertical box
of arbitrary width and length equal to the distance HH (defined as upper
H-value minus lower H-value and called the H-spread). A solid line (called
the M-line) is marked within the box at a distance MH above the lower end
of the box (MH equals M minus lower H). Dashed lines are extended from
the lower and upper ends of the box a distance equal to the distance of the
extremes from the hinges. If one wants to indicate a confidence interval
for the median, one might add a line perpendicular to the M-line at its
midpoint, and of length $\pm \mathrm{HH}/\sqrt{n}$. The box-plot described
should be called an H-Box Plot, because by replacing H-values by other types
of values (called E-values and D-values) one can consider E-Box Plots and
D-Box Plots.

The H-values are most conveniently defined as $\tilde Q(0.25)$ and
$\tilde Q(0.75)$, the 1/4 percentiles. The E-values are the 1/8 percentiles
$\tilde Q(0.125)$ and $\tilde Q(0.875)$. The D-values are the 1/16 percentiles
$\tilde Q(0.0625)$ and $\tilde Q(0.9375)$.

The mid-summaries of a data batch are

$$\tilde\mu(p) = \tfrac{1}{2}\, [\tilde Q(1 - p) + \tilde Q(p)], \quad 0 \le p \le 0.5.$$

Of particular interest are

$$\tilde\mu_M = \tilde\mu(0.5), \quad \tilde\mu_H = \tilde\mu(0.25), \quad \tilde\mu_E = \tilde\mu(0.125), \quad \tilde\mu_D = \tilde\mu(0.0625).$$

When $H_0: Q(u) = \mu + \sigma Q_0(u)$ holds, and $Q_0(1 - u) = -Q_0(u)$,
$\tilde\mu(p)$ is an approximately unbiased estimator of $\mu$.

The average of the extreme values of the sample will be denoted
$\tilde\mu(0)$. The closeness of $\tilde\mu(0)$ to the other $\tilde\mu$ values
may indicate whether the data batch has a short-tailed symmetric distribution
such as the uniform. The mid-spreads of a data batch are

$$\tilde S(p) = \tilde Q(1 - p) - \tilde Q(p), \quad 0 \le p \le 0.5.$$

Given a specified standardized quantile function $Q_0$, the mid-scales
are defined by

$$\tilde\sigma(p) = \tilde S(p) \div S_0(p)$$

where $S_0(p) = Q_0(1 - p) - Q_0(p)$ is the mid-spread of $Q_0$. When $H_0$
holds, $\tilde\sigma(p)$ is an approximately unbiased estimator of $\sigma$.
Of particular interest are

$$\tilde\sigma_H = \tilde\sigma(0.25), \quad \tilde\sigma_E = \tilde\sigma(0.125), \quad \tilde\sigma_D = \tilde\sigma(0.0625).$$
Quick and dirty estimators of $\mu$ and $\sigma$ are given by

$$\tilde\mu^* = \tfrac{1}{4}\, [\tilde\mu_M + \tilde\mu_H + \tilde\mu_E + \tilde\mu_D],$$

$$\tilde\sigma^* = \tfrac{1}{5}\, [\tilde\sigma_H + 2\tilde\sigma_E + 2\tilde\sigma_D].$$

Diagnostic tests for the validity of $H_0$ are obtained by testing for
the equality of the various $\tilde\mu$ and $\tilde\sigma$ values. More
quantitative diagnostic measures could be defined as follows:

$$\mathrm{SKEW}(p) = [\,\ldots - \tilde\mu(p)] \div \tilde S(p)$$

$$\mathrm{TAIL}(p) = \log\{\tilde S(p) \div \tilde S(0.25)\}$$

$$\mathrm{TAIL}_0(p) = \log\{S_0(p) \div S_0(0.25)\}$$

$$\mathrm{TAIL}_\Phi(p) = \log\{\Phi^{-1}(p) \div \Phi^{-1}(0.25)\}$$

When SKEW(p) is not significantly different from zero, we consider
the data batch to have a symmetric distribution.

When the data passes a SKEW test for symmetry, it is checked for
normality by comparing $\mathrm{TAIL}(p)$ with $\mathrm{TAIL}_\Phi(p)$:
$\mathrm{TAIL}(p)$ significantly larger than $\mathrm{TAIL}_\Phi(p)$ indicates
a long-tailed distribution, and $\mathrm{TAIL}(p)$ significantly smaller than
$\mathrm{TAIL}_\Phi(p)$ indicates either a short-tailed distribution
(especially a uniform) or possibly a bimodal distribution.
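The mid-summaries, mid-scales, and TAIL measures of this section can be computed directly from the sample quantile function. The following Python sketch is illustrative only: the function names are hypothetical, the standardized quantile function $Q_0$ is assumed to be the standard normal, and the quantile helper clamps to the extremes at the ends of the unit interval.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def sample_quantile(data, u):
    """Piecewise linear sample quantile with knots at (2j - 1) / (2n)."""
    xs = sorted(data)
    n = len(xs)
    t = n * u + 0.5
    if t <= 1:
        return xs[0]
    if t >= n:
        return xs[-1]
    j = int(t)
    frac = t - j
    return (1 - frac) * xs[j - 1] + frac * xs[j]


def mid_summary(data, p):
    """Average of the upper and lower p-quantiles."""
    return 0.5 * (sample_quantile(data, 1 - p) + sample_quantile(data, p))


def mid_spread(data, p):
    """Distance between the upper and lower p-quantiles."""
    return sample_quantile(data, 1 - p) - sample_quantile(data, p)


def mid_scale(data, p):
    """Mid-spread divided by the mid-spread of the standard normal (assumed Q0)."""
    s0 = _STD_NORMAL.inv_cdf(1 - p) - _STD_NORMAL.inv_cdf(p)
    return mid_spread(data, p) / s0


def tail(data, p):
    """Log ratio of the p mid-spread to the quartile mid-spread."""
    return math.log(mid_spread(data, p) / mid_spread(data, 0.25))


def tail_normal(p):
    """Reference TAIL value for the normal distribution."""
    return math.log(_STD_NORMAL.inv_cdf(p) / _STD_NORMAL.inv_cdf(0.25))
```

For an evenly spaced batch such as the integers 1 through 9, all mid-summaries coincide and `tail(data, 0.0625)` falls below `tail_normal(0.0625)`, which is the short-tailed pattern described in the text.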
A seven-number summary of a data batch is provided by its M, H,
E and D values, which suffice to compute mid-summaries, mid-scales,
SKEW, and TAIL measures. To find re-expressions (transformations) of
the data which make it more normal, one needs only the seven-number summary
of the re-expressed data batch, which is easily found by re-expressing
the seven-number summary of the original data batch.

In addition to the analytical measures of the data, one should
form a graphical display of the quantile function $\tilde Q(u)$ as a function
on the unit interval $0 \le u \le 1$; the H, E, and D boxes are drawn
superimposed.

Quantile-Box Plots enable the investigator to detect "non-ideal"
aspects of data batches by testing the data for normality by tests which
determine the directions in which data fails to be normal, such as
(1) long-tailed distribution, (2) outliers, (3) bimodal distribution,
(4) non-symmetric distribution.

To check for symmetry, inspect the shape of $\tilde Q(u)$ within the boxes,
as well as compare mid-summaries and examine the SKEW diagnostic measures.

When the data passes the test for symmetry the question of whether
it has a normal or long-tailed distribution is decided using the TAIL
diagnostic measures. Small TAIL values may indicate bimodal distributions.
Data sets with outliers may also yield small TAIL values.

If the graph $x = \tilde Q(u)$ has points with sharp rises ("infinite"
slopes), then the probability density has a zero and will therefore have two
(or more) modes. If the points of sharp rise lie inside the H-Box we suspect
the presence of several distinct populations generating the single data
batch. If points of sharp rise lie outside the E-Box, we suspect outliers
(values to be discarded for robust estimation).

A mode in the probability density function is indicated in the graph
$x = \tilde Q(u)$ by a point of inflection (with "finite" slope). A horizontal
segment in the graph is interpreted to mean a very large probability density
there.


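8. Location and Scale Parameter Estimation as Regression Analysis of Sample Quantile Process

One can consider estimators, denoted $\hat\mu_{p,q}$ and $\hat\sigma_{p,q}$,
which use the sample quantile function $\tilde Q(u)$, $p \le u \le q$; this is
equivalent to using a restricted set of order statistics
$X_{(np)}, \ldots, X_{(nq)}$, or a trimmed sample. A compact derivation of such
formulas is given by Parzen (1978) who gives the representation

$$\begin{bmatrix} \hat\mu_{p,q} \\ \hat\sigma_{p,q} \end{bmatrix} = \begin{bmatrix} I_{\mu\mu} & I_{\mu\sigma} \\ I_{\mu\sigma} & I_{\sigma\sigma} \end{bmatrix}^{-1} \begin{bmatrix} T_{\mu,p,q} \\ T_{\sigma,p,q} \end{bmatrix}$$

where

$$T_{\mu,p,q} = \int_p^q W_\mu(u)\, \tilde Q(u)\, du + \tilde Q(p)\, W_{\mu L}(p) + \tilde Q(q)\, W_{\mu R}(q),$$

$$T_{\sigma,p,q} = \int_p^q W_\sigma(u)\, \tilde Q(u)\, du + \tilde Q(p)\, W_{\sigma L}(p) + \tilde Q(q)\, W_{\sigma R}(q),$$

$$I_{\mu\mu} = \int_p^q W_\mu(u)\, du + W_{\mu L}(p) + W_{\mu R}(q),$$

$$I_{\mu\sigma} = \int_p^q W_\sigma(u)\, du + W_{\sigma L}(p) + W_{\sigma R}(q),$$

$$I_{\sigma\sigma} = \int_p^q W_\sigma(u)\, Q_0(u)\, du + W_{\sigma L}(p)\, Q_0(p) + W_{\sigma R}(q)\, Q_0(q).$$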

The weight functions are expressed in terms of the density-quantile
function $f_0Q_0(u) = f_0(Q_0(u))$ and the score function

$$J_0(u) = -(f_0Q_0)'(u) = \frac{-f_0'(F_0^{-1}(u))}{f_0(F_0^{-1}(u))} = \psi(Q_0(u)).$$

$$W_\mu(u) = J_0'(u)\, f_0Q_0(u)$$

$$W_\sigma(u) = J_0(u) + Q_0(u)\, W_\mu(u)$$

$$W_{\mu L}(p) = f_0Q_0(p) \left\{\frac{1}{p}\, f_0Q_0(p) + J_0(p)\right\}$$

$$W_{\mu R}(q) = f_0Q_0(q) \left\{\frac{1}{1 - q}\, f_0Q_0(q) - J_0(q)\right\}$$

$$W_{\sigma L}(p) = Q_0(p)\, W_{\mu L}(p) - f_0Q_0(p)$$

$$W_{\sigma R}(q) = Q_0(q)\, W_{\mu R}(q) + f_0Q_0(q)$$


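For normally distributed data,

$$f_0(x) = \phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-(1/2)x^2}, \qquad F_0(x) = \Phi(x) = \int_{-\infty}^{x} \phi(y)\, dy,$$

$$f_0Q_0(u) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\tfrac{1}{2}\, |\Phi^{-1}(u)|^2\right\},$$

$$J_0(u) = \Phi^{-1}(u), \qquad J_0'(u) = \{\phi\Phi^{-1}(u)\}^{-1},$$

$$W_\mu(u) = 1, \qquad W_\sigma(u) = 2\,\Phi^{-1}(u),$$

$$W_{\mu L}(p) = \phi\Phi^{-1}(p) \left\{\frac{1}{p}\, \phi\Phi^{-1}(p) + \Phi^{-1}(p)\right\},$$

$$W_{\sigma L}(p) = \Phi^{-1}(p)\, W_{\mu L}(p) - \phi\Phi^{-1}(p).$$

When $q = 1 - p$, $I_{\mu\sigma} = 0$,

$$I_{\mu\mu} = 1 - 2p + 2\, W_{\mu L}(p),$$

$$I_{\sigma\sigma} = 2 \int_p^q |\Phi^{-1}(u)|^2\, du + 2\,\Phi^{-1}(p)\, W_{\sigma L}(p).$$

The estimator

$$\hat\mu_{p,q} = \frac{\int_p^{1-p} \tilde Q(u)\, du + W_{\mu L}(p)\, \{\tilde Q(p) + \tilde Q(1 - p)\}}{1 - 2p + 2\, W_{\mu L}(p)}$$

is similar to the Winsorized mean (with trimming proportion $p$).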
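A minimal Python sketch of the symmetric normal-case location estimator $\hat\mu_{p,1-p}$ is given here for illustration. It assumes the piecewise linear sample quantile function of section 2, clamped to the extremes at the ends, and integrates it exactly by the trapezoid rule between knots; the names `trimmed_normal_location` and `integrate_quantile` are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def sample_quantile(xs_sorted, u):
    """Piecewise linear sample quantile; xs_sorted must be in increasing order."""
    n = len(xs_sorted)
    t = n * u + 0.5
    if t <= 1:
        return xs_sorted[0]
    if t >= n:
        return xs_sorted[-1]
    j = int(t)
    frac = t - j
    return (1 - frac) * xs_sorted[j - 1] + frac * xs_sorted[j]


def integrate_quantile(xs_sorted, a, b):
    """Integral of the sample quantile function over [a, b].

    The function is piecewise linear, so the trapezoid rule applied
    between consecutive knots is exact.
    """
    n = len(xs_sorted)
    knots = [(2 * j - 1) / (2 * n) for j in range(1, n + 1)]
    points = [a] + [k for k in knots if a < k < b] + [b]
    total = 0.0
    for left, right in zip(points, points[1:]):
        total += 0.5 * (sample_quantile(xs_sorted, left)
                        + sample_quantile(xs_sorted, right)) * (right - left)
    return total


def trimmed_normal_location(data, p):
    """Location estimator using the sample quantiles on [p, 1 - p], normal model."""
    if not 0.0 < p < 0.5:
        raise ValueError("p must lie strictly between 0 and 0.5")
    xs = sorted(data)
    z = _STD_NORMAL.inv_cdf(p)
    dens = _STD_NORMAL.pdf(z)
    w_left = dens * (dens / p + z)       # end weight W_muL(p) in the normal case
    numerator = integrate_quantile(xs, p, 1 - p) + w_left * (
        sample_quantile(xs, p) + sample_quantile(xs, 1 - p))
    return numerator / (1 - 2 * p + 2 * w_left)
```

Because only the sample quantiles between $p$ and $1 - p$ enter, replacing the largest observation by a gross outlier leaves the estimate unchanged as long as the outlier lies beyond the upper cutoff.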


Robust Maximum Likelihood Estimation of Mean and Variance of a
Normal Distribution. We may be willing to assume that our data is more
normal than long-tailed, but that the shape of the true distribution deviates
slightly from the assumed normal model due to "wrong" values in the data
set. We propose the following exploratory data analysis for robust estimation
of $\mu$ and $\sigma$ from normal data with possible "outliers." We suggest the
name "robust maximum likelihood estimators" for these estimators.

For selected values of $p$ (at least $p = 0.05$, $0.25$, and $0.45$),
(1) compute the estimators $\hat\mu_{p,1-p}$ and $\hat\sigma_{p,1-p}$, and
(2) plot the residuals

$$\tilde Q(u) - \hat Q(u) = \tilde Q(u) - \hat\mu_{p,1-p} - \hat\sigma_{p,1-p}\, \Phi^{-1}(u),$$

multiplied by $\phi\Phi^{-1}(u)$. Their values over the interval
$p \le u \le 1 - p$ can be used to test the hypothesis $H_0$. The residuals over
the tail intervals $u \le p$ and $u \ge 1 - p$ can be used to test for the
presence of "wrong values." One estimates $\mu$ and $\sigma$ by those
estimators corresponding to the lowest value of $p$ for which one finds no
"wrong values" over the tail intervals.

9.  Robust  Regression 

Formulating  the  estimation  of  location  and  scale  parameters  as  a 
problem  of  weighted  regression  of  the  sample  quantile  function,  with  weights 
a function  of  Qq(u)  » leads  to  the  amazing  conclusion  that  asymptotically  ef- 
ficient estimators  of  gi  and  a are  obtainable  numerically  by  iterating 
ordinary  regression  calculations! 

A A 

The  iteratively  reweighted  estimators  (i  and  0 are  also  the 
solutions  to  the  problem  of  estimating  u and  0 in  the  following  weighted 
least  squares  linear  regression  problem: 


X.  = U + C. 
J J 


where  £.  are  independent  normal  with  mean  0 and  variance  satisfying 


2 1 * * 
Var< V = 0 ■ wj  ■ w(ej  > • ej  = 


(*> 


w. 

J 
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r 


A regression  model  can  be  written 


Y.  = P,  X,  , + ...+  fix..  + e. 
J rl  lj  ^ Kj  J 


where  {e.}  are  independent  random  variables  with  quantile  function  known 
up  to  a parameter  o 


Q£(u)  = a Qq(u) 


To robustly estimate the coefficients $\beta_1, \ldots, \beta_k, \sigma$, assume first
$Q_0(u) = \Phi^{-1}(u)$, corresponding to normality, and by ordinary least squares
linear regression obtain preliminary estimators $\hat\beta_1, \ldots, \hat\beta_k$; then
form residuals

$$\epsilon_j^* = (Y_j - \hat\beta_1 X_{1j} - \cdots - \hat\beta_k X_{kj}) \div \hat\sigma .$$

The next stage of estimators $\hat\beta_1, \ldots, \hat\beta_k, \hat\sigma$ are taken to be the least
squares linear regression estimators under the assumption that $\epsilon_j$ have
variances defined by (*). This process is iterated to yield robust
estimators (compare Huber (1977), p. 38, Algorithm W).
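A minimal sketch of this iteration, for the special case of a straight line ($k = 2$, with $X_{1j} = 1$), is given below. Two ingredients are assumptions made here for illustration and are not taken from the paper: the weight function is Huber's weight with tuning constant 1.345, standing in for the weight function $w$ of (*), and the scale $\hat\sigma$ is re-estimated at each step from the median absolute residual instead of by the regression described in the text. All function names are hypothetical.

```python
from statistics import median


def weighted_line_fit(x, y, w):
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def huber_weight(e, c=1.345):
    """Huber weight: an assumed stand-in for the paper's weight function w."""
    return 1.0 if abs(e) <= c else c / abs(e)


def irls_line(x, y, iterations=20):
    """Iteratively reweighted least squares for a straight line."""
    w = [1.0] * len(x)                     # first pass is ordinary least squares
    for _ in range(iterations):
        b0, b1 = weighted_line_fit(x, y, w)
        resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        # Assumed scale estimate: median absolute residual / 0.6745.
        sigma = max(median(abs(r) for r in resid) / 0.6745, 1e-12)
        # Standardized residuals give the weights for the next pass;
        # Var(eps_j) proportional to 1 / w_j means regression weight w_j.
        w = [huber_weight(r / sigma) for r in resid]
    return b0, b1, sigma
```

Each pass is an ordinary (weighted) regression calculation, which is the point of the construction: no special-purpose optimizer is needed, only repeated least squares with updated weights.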

The long-tailed character of the residuals should also be
examined, using Quantile-Box plots.

One might consider non-parametric non-linear regression of
$Y$ on $X_1, \ldots, X_k$. A density-quantile approach to non-linear
non-parametric regression has been described by Parzen (1977), and is
currently  being  investigated  by  Prof.  J.  P.  Carmichael.  It  has  been 
applied  to  time  series  analysis  by  Prof.  M.  Pagano.  It  provides  means 
of  checking  whether  robust  estimation  of  variances  and  correlations  is 
provided  by  robust  estimation  of  linear  regression  coefficients. 


10.  Do  we  need  a new  definition  of  Statistics? 

Can statistics be made a subject that provides intellectually
exciting pastimes (for the young and the mature), is regarded as relevant
by the creative scientist, and is appealing as a career to the mathematically
talented?

An  important  step  in  achieving  these  desirable  (and  I believe 
attainable)  goals  is  to  alter  the  perception  of  the  sample  mean  and  the  sample 
variance  in  elementary  statistical  instruction.  Introductory  Statistics 
is  regarded  by  almost  all  college  students  (even  by  mathematically  talented 
students)  as  a very  dull  subject.  Perhaps  one  reason  is  that  students  enter 
the  course  knowing  about  a mean  and  a variance  and  leave  the  course 
knowing  only  about  a mean  and  a variance.  That  statistics  is  in  fact  a 
live  and  vibrant  discipline  can  be  communicated  to  the  student  by  emphasizing 
that  there  are  many  ways  to  estimate  mean  and  variance,  and  more  generally 
location  and  scale  parameters. 


I believe that the discipline of statistics can be made more "glamorous"
if intellectually sound and demonstrably useful concepts of statistical
data analysis and robust statistical inference are incorporated in introductory
statistical instruction. The perspective which this paper proposes for
interpreting robust statistical inference is equivalent to a proposal for the
definition of statistics:

"Statistics is arithmetic done by the method of Lebesgue
integration."

I realize  this  definition  sounds  unbelievable  and  may  never  sell 
to  the  introductory  student.  But  at  least  statisticians  should  understand 
to  what  extent  it  is  true.  Perhaps  it  provides  a basis  for  a new  sect  of 
statisticians. 

Can we all agree that a basic problem of statistics is an arithmetical
one: find the average $\bar X$ of a set of numbers $X_1, \ldots, X_n$? Even
grade school students (in the U.S.A.) nowadays know the answer:

$$\bar X = \frac{1}{n} (X_1 + X_2 + \cdots + X_n) .$$

In words: list the numbers, add them up, and divide by $n$. What should
be  realized  is  that  the  foregoing  algorithm  is  the  method  of  Riemann 
integration. 

The method of Lebesgue integration finds $\bar X$ by first finding
the distribution function $\tilde F(x)$ of the data, defined by $\tilde F(x)$ = fraction
of $X_1, \ldots, X_n \le x$, $-\infty < x < \infty$. Then $\bar X$ is found as the mean of
this distribution function, defined by the integral

$$\bar X = \int_{-\infty}^{\infty} x \, d\tilde F(x) .$$
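The two routes to the same average can be compared on a small example. The sketch below is illustrative only: the function names and the data are hypothetical, and exact fractions are used so that the two results can be compared for equality. The first function lists the numbers, adds them, and divides by $n$; the second sums each distinct value times the fraction of the data equal to it, which is the integral of $x$ against the sample distribution function.

```python
from collections import Counter
from fractions import Fraction


def mean_by_listing(xs):
    """List the numbers, add them up, and divide by n."""
    return Fraction(sum(xs), len(xs))


def mean_by_distribution(xs):
    """Sum over distinct values x of x times the jump of F at x."""
    n = len(xs)
    counts = Counter(xs)                   # how many observations equal each value
    return sum(value * Fraction(count, n) for value, count in counts.items())


# Hypothetical data: coin values in cents.
coins = [1, 5, 10, 25, 1, 1, 5, 50, 10, 25, 25, 1]
```

Both functions return the same number for any list of integers; the second one works from the distribution of the values and never uses the order in which they were listed.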

To use an analogy: to count a sack of coins, first arrange the coins in piles
according to their value (pennies, nickels, dimes, quarters, and
half-dollars), then count the number of coins in each pile, determine the value
of each pile, and finally obtain $\bar X$ as the sum of the values of the piles,
divided by $n$. The role of statistics is to find more accurate estimators
of the true mean by fitting a smooth distribution function $\hat F(x)$ to $\tilde F(x)$.
Still more insight (and fidelity to the truth) is obtained by displaying
the sample quantile function $\tilde Q(u) = \tilde F^{-1}(u) = \inf\{x : \tilde F(x) \ge u\}$, and
fitting smooth quantile functions $\hat Q(u)$ to $\tilde Q(u)$. Then one computes
the "sample average" by

$$\hat\mu = \int_0^1 \hat Q(u)\,du = \int_0^1 W_\mu(u)\,\tilde Q(u)\,du \div \int_0^1 W_\mu(u)\,du .$$


In words, the "average" of a sample is a weighted average of the numbers
in the sample arranged in increasing order, with the weight of a number
depending on its rank. This is the essence of robust statistical data analysis;
all the rest is commentary.
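A rank-weighted average of this kind can be sketched as follows. The discretization is an assumption made here for illustration: the integral over $u$ is replaced by a sum over the midpoints $u_j = (j + 0.5)/n$, and the two weight functions shown (constant weight, and a weight that is zero outside $p \le u \le 1-p$) are simple examples, not the weight functions derived in the paper. The function names are hypothetical.

```python
def rank_weighted_average(xs, weight):
    """Weighted average of the ordered sample, with weight(u) evaluated at
    the rank position u_j = (j + 0.5)/n of each order statistic."""
    ordered = sorted(xs)
    n = len(ordered)
    ws = [weight((j + 0.5) / n) for j in range(n)]
    return sum(w * x for w, x in zip(ws, ordered)) / sum(ws)


def constant_weight(u):
    """Equal weight at every rank: reproduces the ordinary sample mean."""
    return 1.0


def trimming_weight(p):
    """Weight 1 for p <= u <= 1 - p and 0 in the tails: a trimmed mean."""
    return lambda u: 1.0 if p <= u <= 1 - p else 0.0


# Hypothetical data with one wild value.
sample = [2, 4, 4, 5, 7, 9, 11, 12, 13, 1000]
```

With `constant_weight` the wild value dominates the result; with `trimming_weight(0.25)` only the central order statistics contribute, so the wild value has no effect.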
Appendix


EXAMPLES OF QUANTILE-BOX PLOTS

TIPPETT'S WARP BREAK DATA (Compare
box plots in McGill, Tukey, Larsen [1978]).

Fossil data from yellow Limestone formation of northwestern
Jamaica (from Chernoff, H. (1973), "The Use of
Faces to Represent Points in k-Dimensional Space Graphically,"
Journal of the American Statistical Association, 68, 361-368).

Variables 2 and 6 have zeroes in $fQ$ (rises in $Q$).
Variables 3 and 4 have probability masses (flat stretches in $Q$).
Variables 1 and 5 are candidates for re-expression (logarithm
for 1, square root for 5). Variables 2 and 5 suffice to classify
the observations.


[Figure: Quantile-Box plots, Jamaica limestone data. Panel labels: VAR 1, SQRT(X), LOG(X).]



SUPPLEMENTARY NOTES

The findings in this report are not to be construed as an official
Department of the Army position, unless so designated by other authorized
documents.

This paper provides an overview of a new general approach to statistical
data analysis and parameter estimation which could be called the quantile
function approach. The aims of descriptive statistics (to graphically
summarize and display the data) are obtained by Quantile-Box plots of the
sample quantile function. The aims of "goodness of fit" are obtained by
fitting smooth quantile functions to the sample quantile function. The aims
of parameter estimation, especially robust estimation of location and scale
parameters, are attained by regression analysis of the sample quantile
function. (The goal of a statistician in analyzing a batch of data
$X_1, \ldots, X_n$ should be both "estimation of parameters" and "goodness of fit."
By "goodness of fit" is meant fitting of the observed sample probabilities by
a smooth probability law.)

Quantile functions are defined in Section 2. Window estimators of
location and scale parameters are defined in Section 3; their
equivalence to L-estimators is discussed in Section 4. A conjectured
expression is given in Section 5 for the asymptotic variance of window
estimators. New approaches being developed for non-parametric
probability law modeling are mentioned in Section 6; quantile box-plots are
introduced in Section 7. Section 8 discusses location and scale parameter
estimation using trimmed samples. Robust regression is the subject of
Section 9. A new definition of statistics is proposed in Section 10.